January 17, 2018

Machine Learning (supervised)

Machine Learning vs Programming

Machine Learning vs Statistics

linear \(\Rightarrow\) non-linear

additive \(\Rightarrow\) interactions

theory-driven \(\Rightarrow\) optimization-driven

Black Box Problem

You want to predict wine quality from its physicochemical properties.

Step 1: Find data

A free dataset of red and white variants of the Portuguese “Vinho Verde” wine from the Minho (northwest) region of Portugal.

At least three sensory assessors graded each wine in blind tastings, on a scale from 0 (very bad) to 10 (excellent).

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

Step 2: Throw ML on your data

Compare different models with 10-fold cross-validation (CV):

  • Linear regression model
  • Decision tree
  • Random forest

Step 2: Throw ML on your data

learner.id mae.test.mean
regr.ranger 0.4353923
regr.lm 0.5696394
regr.rpart 0.6015873

=> The random forest (regr.ranger) has the lowest mean absolute error (MAE) on the test folds, so it is the best of the three models.
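
The comparison above can be sketched without any ML library: split the data into k folds, fit on k-1 folds, compute the MAE on the held-out fold, and average over folds. This is a minimal Python sketch on hypothetical synthetic data with two stand-in learners (a mean baseline and one-feature least squares); it is not the mlr setup that produced the table.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Baseline learner: always predicts the mean of the training targets."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_simple_linear(xs, ys):
    """One-feature ordinary least squares: y is approximated by a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def cv_mae(fit, xs, ys, k=10):
    """Mean over the k test folds of the per-fold MAE (lower is better)."""
    fold_maes = []
    for test in k_fold_indices(len(xs), k):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [abs(ys[i] - model(xs[i])) for i in test]
        fold_maes.append(sum(errors) / len(errors))
    return sum(fold_maes) / len(fold_maes)


# Hypothetical toy data: a linear signal plus Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(8, 14) for _ in range(200)]
ys = [0.4 * x + rng.gauss(0, 0.5) for x in xs]

results = {
    "mean baseline": cv_mae(fit_mean, xs, ys),
    "simple linear": cv_mae(fit_simple_linear, xs, ys),
}
for name, score in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} {score:.4f}")
```

Both learners see the same folds (fixed seed), so their scores are directly comparable, as in the table above.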

Step 3: Profit

Client: “We would love to learn some insights.”

Looking inside the black box

What are the most important features?

TODO: Slide explaining permutation feature importance
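
Permutation feature importance, as a minimal Python sketch: shuffle one feature column, leave the model untouched, and measure how much the error grows. Here importance is the increase in MAE (a ratio of errors is another common choice); the black box and the data are hypothetical stand-ins.

```python
import random


def mae(model, X, y):
    """Mean absolute error of `model` on rows X with targets y."""
    return sum(abs(yi - model(row)) for row, yi in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Importance of feature j = average increase in MAE after shuffling column j.

    The model is never refit: permuting the column only breaks the link
    between feature j and the target.
    """
    rng = random.Random(seed)
    baseline = mae(model, X, y)
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mae(model, X_perm, y) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


# Hypothetical black box that ignores its third feature.
def black_box(row):
    return 2.0 * row[0] + 0.5 * row[1]


rng = random.Random(42)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(300)]
y = [black_box(row) + rng.gauss(0, 0.1) for row in X]

print(permutation_importance(black_box, X, y))
```

A feature the model never uses gets an importance of exactly zero, because shuffling it cannot change any prediction.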

How do features affect predictions?

TODO: Slide explaining ALE plots

How do features affect predictions?

Method: Accumulated Local Effects
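
Accumulated Local Effects for one numeric feature can be sketched in a few steps: cut the feature's range at empirical quantiles, average the change in prediction when each instance's value is moved from the lower to the upper edge of its interval, accumulate these local effects, and center the curve. This is a minimal Python sketch with a hypothetical black box; the centering follows the common count-weighted convention, and it is not the implementation behind the plots.

```python
import random
from bisect import bisect_left


def ale_1d(model, X, j, n_intervals=10):
    """First-order ALE of feature j. Returns (grid, centered ALE at each grid point)."""
    values = sorted(row[j] for row in X)
    n = len(values)
    # Interval edges at empirical quantiles (duplicates dropped).
    grid = sorted({values[round(q * (n - 1) / n_intervals)] for q in range(n_intervals + 1)})
    K = len(grid) - 1
    if K < 1:
        raise ValueError("feature j needs at least two distinct values")
    sums, counts = [0.0] * K, [0] * K
    for row in X:
        # Interval k covers (grid[k], grid[k + 1]]; the minimum joins interval 0.
        k = min(max(bisect_left(grid, row[j]) - 1, 0), K - 1)
        lower = row[:j] + [grid[k]] + row[j + 1:]
        upper = row[:j] + [grid[k + 1]] + row[j + 1:]
        sums[k] += model(upper) - model(lower)
        counts[k] += 1
    # Accumulate the average local effect of each interval.
    ale = [0.0]
    for k in range(K):
        ale.append(ale[-1] + (sums[k] / counts[k] if counts[k] else 0.0))
    # Center: subtract the count-weighted mean of the interval midpoints.
    mean = sum((ale[k] + ale[k + 1]) / 2 * counts[k] for k in range(K)) / sum(counts)
    return grid, [a - mean for a in ale]


# Hypothetical black box: linear in x0, quadratic in x1.
def black_box(row):
    return 3.0 * row[0] + row[1] ** 2


rng = random.Random(0)
X = [[rng.uniform(0, 1), rng.uniform(-1, 1)] for _ in range(500)]
grid, ale = ale_1d(black_box, X, j=0)
for z, a in zip(grid, ale):
    print(f"x0 = {z:.3f}  ALE = {a:+.3f}")
```

Because predictions are only compared within narrow intervals of observed data, ALE avoids evaluating the model far away from where the data lies.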

How do features affect predictions?

Interactions between alcohol and volatile acidity?
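
The slide does not say which interaction measure was used. One common model-agnostic choice is Friedman's H-statistic, which compares the joint two-feature partial dependence with the sum of the one-feature partial dependences. This is a minimal Python sketch of H² with hypothetical models; it needs on the order of n² predictions, so in practice it is computed on a subsample.

```python
import random


def partial_dependence(model, X, fixed):
    """Average prediction over X with the features in `fixed` (index -> value) overwritten."""
    total = 0.0
    for row in X:
        r = list(row)
        for idx, val in fixed.items():
            r[idx] = val
        total += model(r)
    return total / len(X)


def centered(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def h_squared(model, X, j, k):
    """Friedman's two-way H^2: share of the joint PD variance not explained by the
    sum of the one-feature PDs (0 means no interaction between j and k)."""
    pd_jk = centered([partial_dependence(model, X, {j: r[j], k: r[k]}) for r in X])
    pd_j = centered([partial_dependence(model, X, {j: r[j]}) for r in X])
    pd_k = centered([partial_dependence(model, X, {k: r[k]}) for r in X])
    num = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    den = sum(a ** 2 for a in pd_jk)
    return num / den if den else 0.0


def additive(r):
    """Hypothetical model without any interaction."""
    return r[0] + r[1]


def product(r):
    """Hypothetical model that is a pure interaction."""
    return r[0] * r[1]


rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(100)]
print(h_squared(additive, X, 0, 1), h_squared(product, X, 0, 1))
```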

Rule of thumb for wine quality?

The tree explains 37.36% of the black box prediction variance.
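
A statement like this is typically the R² of a global surrogate: an interpretable model trained on the black box's predictions (not on the true quality scores) and scored by how much of the variance of those predictions it reproduces. A minimal Python sketch, assuming a depth-1 tree (a single split) as the surrogate for brevity and a hypothetical black box:

```python
import random


def fit_stump(X, y):
    """Depth-1 regression tree: the single (feature, threshold) split with the lowest SSE."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:
        raise ValueError("no feature has two distinct values")
    _, j, t, ml, mr = best
    return j, t, (lambda row: ml if row[j] <= t else mr)


def r_squared(target, pred):
    """1 - SSE/SST, measured against `target` (here: the black-box predictions)."""
    m = sum(target) / len(target)
    sst = sum((v - m) ** 2 for v in target)
    sse = sum((v - p) ** 2 for v, p in zip(target, pred))
    return 1 - sse / sst


# Hypothetical black box: a step in x0 plus a weak linear term in x1.
def black_box(row):
    return (2.0 if row[0] > 0.5 else 0.0) + 0.3 * row[1]


rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
bb_pred = [black_box(row) for row in X]  # the surrogate's training targets
feature, threshold, surrogate = fit_stump(X, bb_pred)
r2 = r_squared(bb_pred, [surrogate(row) for row in X])
print(f"split on x{feature} at {threshold:.3f}; R^2 = {r2:.3f}")
```

A low R² such as 37% says the tree is a rough rule of thumb: most of the black box's behaviour is not captured by it.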

Exceptionally bad wine

TODO: Image of really bad wine

Predicted quality: 3.7628

Shapley Value

TODO: Slide to explain Shapley value
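
For a handful of features, Shapley values can be computed exactly by enumerating all coalitions. In this minimal Python sketch, the value of a coalition S is the average prediction when the features in S keep the explained instance's values and the remaining features are taken from background rows; that value function is one common choice, and the model, instance and background data are hypothetical. The cost grows as 2^p, which is why sampling approximations are used in practice.

```python
from itertools import combinations
from math import factorial


def coalition_value(model, x, background, S):
    """v(S): mean prediction with features in S fixed to x, the rest from background rows."""
    total = 0.0
    for z in background:
        total += model([x[i] if i in S else z[i] for i in range(len(x))])
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley value of every feature for instance x."""
    p = len(x)
    phi = [0.0] * p
    for j in range(p):
        others = [i for i in range(p) if i != j]
        for size in range(p):
            # Weight |S|! (p - |S| - 1)! / p! for coalitions S that do not contain j.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for members in combinations(others, size):
                S = set(members)
                gain = (coalition_value(model, x, background, S | {j})
                        - coalition_value(model, x, background, S))
                phi[j] += weight * gain
    return phi


# Hypothetical black box with an interaction between features 0 and 2.
def black_box(r):
    return 1.0 * r[0] - 2.0 * r[1] + 0.5 * r[0] * r[2]


background = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.5], [2.0, 1.0, 1.0], [0.5, 0.5, 2.0]]
x = [3.0, 0.0, 2.0]
phi = shapley_values(black_box, x, background)
print(phi, sum(phi))
```

The values add up to the instance's prediction minus the average background prediction, which is what lets a Shapley plot split one prediction (like the 3.7628 above) into feature contributions.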

What needs to change?

TODO: Slide to explain counterfactuals

Counterfactual explanation

How do we get the wine above a predicted quality of 5?

##      type fixed.acidity volatile.acidity citric.acid residual.sugar
## 5589  red           7.4            1.185           0           4.25
##      chlorides free.sulfur.dioxide total.sulfur.dioxide density   pH
## 5589     0.097                   5                   14  0.9966 3.63
##      sulphates alcohol quality
## 5589      0.54    10.7       3
## [1] 5.092067
## [1] 5.006533

TODO: Image

  • Decreasing volatile acidity to 0.2 yields a predicted quality of 5.09
  • Decreasing volatile acidity to 1.0 and increasing alcohol to 13% yields a predicted quality of 5.01
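
A brute-force search is enough to illustrate how counterfactuals like these can be found: try combinations of candidate values, keep those whose prediction crosses the threshold, and prefer changes to fewer features, then smaller (scaled) changes. This Python sketch uses the two feature values of wine 5589 shown above, but the linear `toy_quality` model, the candidate grids and the scales are hypothetical, so its numbers are not the random forest's.

```python
from itertools import product


def find_counterfactuals(model, x, grids, threshold, scales):
    """Try every combination of candidate values in `grids` (feature -> list of values).

    Keeps candidates whose prediction exceeds `threshold`, sorted by number of
    changed features, then by L1 distance with each change divided by `scales[f]`.
    """
    features = list(grids)
    found = []
    for values in product(*(grids[f] for f in features)):
        candidate = dict(x)
        candidate.update(zip(features, values))
        pred = model(candidate)
        if pred > threshold:
            n_changed = sum(candidate[f] != x[f] for f in features)
            distance = sum(abs(candidate[f] - x[f]) / scales[f] for f in features)
            found.append((n_changed, distance, pred, {f: candidate[f] for f in features}))
    found.sort(key=lambda item: (item[0], item[1]))
    return found


# Hypothetical stand-in for the black box (made-up coefficients).
def toy_quality(w):
    return 5.5 - 2.0 * w["volatile.acidity"] + 0.25 * (w["alcohol"] - 10)


wine = {"volatile.acidity": 1.185, "alcohol": 10.7}  # values of wine 5589
grids = {
    "volatile.acidity": [1.185, 1.0, 0.8, 0.6, 0.4, 0.2],
    "alcohol": [10.7, 11.0, 12.0, 13.0, 14.0],
}
scales = {"volatile.acidity": 0.2, "alcohol": 1.2}  # hypothetical, e.g. feature SDs

for n_changed, dist, pred, change in find_counterfactuals(toy_quality, wine, grids, 5, scales)[:3]:
    print(n_changed, round(dist, 2), round(pred, 2), change)
```

Grid search only scales to a few features; for more, counterfactual methods usually optimize a loss that trades off reaching the target against distance to the original instance.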

Why interpretability?

What tools do we have?

TODO: Overview (drawing)

  • global vs. local
  • interpretable model vs. post-hoc
  • model-specific vs. model-agnostic
  • data type: image vs. text vs. tabular

Interpretable Models

Interpretable Model: Linear Regression

Interpretable Model: Decision Tree

Interpretable Model: Decision Rules

IF \(90\,\text{m}^2 \leq \text{size} < 110\,\text{m}^2\) AND location \(=\) “good” THEN rent is between 1540 and 1890 EUR
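
Written as code, such a rule is just a condition and an outcome. This minimal Python sketch encodes the rent rule above (the function names and the sample apartments are hypothetical) and also computes its coverage, the share of instances the rule applies to; instances outside that region have to be handled by other rules or a default.

```python
def rent_rule(size_m2, location):
    """IF 90 <= size < 110 AND location == "good" THEN rent in [1540, 1890] EUR."""
    if 90 <= size_m2 < 110 and location == "good":
        return (1540, 1890)
    return None  # the rule does not fire for this apartment


def coverage(rule, data):
    """Share of instances for which the rule fires."""
    return sum(rule(*row) is not None for row in data) / len(data)


apartments = [(100, "good"), (80, "good"), (95, "bad"), (90, "good")]  # hypothetical
print([rent_rule(*a) for a in apartments], coverage(rent_rule, apartments))
```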

Model-specific methods

Model-specific methods

TODO: Example for CNNs

Model-specific methods

TODO: Example for text (RNNs and attention?)

Model-agnostic methods

Model-agnostic methods

TODO: Drawing of feature effect

Model-agnostic methods

TODO: Drawing of importance

Model-agnostic methods: Global Surrogate

Model-agnostic methods: Local Surrogate
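
A local surrogate in the spirit of LIME can be sketched as: sample points around the instance, weight them by proximity, and fit a weighted linear model whose coefficients serve as the local explanation. This minimal Python sketch uses Gaussian perturbations and an exponential kernel; the sampling scheme, kernel width and black box are assumptions, and it is not the lime package's exact procedure.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def local_surrogate(model, x, n_samples=500, sigma=0.5, width=0.5, seed=0):
    """Weighted least-squares fit of model(z) ~ w0 + w . z on samples z around x.

    Returns (intercept, coefficients); the coefficients are the local explanation.
    """
    rng = random.Random(seed)
    p = len(x)
    A = [[0.0] * (p + 1) for _ in range(p + 1)]  # accumulates Z^T W Z
    b = [0.0] * (p + 1)                          # accumulates Z^T W y
    for _ in range(n_samples):
        z = [xi + rng.gauss(0, sigma) for xi in x]
        weight = math.exp(-sum((zi - xi) ** 2 for zi, xi in zip(z, x)) / width ** 2)
        row = [1.0] + z
        target = model(z)
        for i in range(p + 1):
            b[i] += weight * row[i] * target
            for k in range(p + 1):
                A[i][k] += weight * row[i] * row[k]
    coef = solve(A, b)
    return coef[0], coef[1:]


# Hypothetical black box: nonlinear in z0, ignores z1.
def black_box(z):
    return z[0] ** 2


# Near x = [1, 0], z0**2 has slope 2 in z0 and slope 0 in z1.
intercept, slopes = local_surrogate(black_box, [1.0, 0.0])
print(intercept, slopes)
```

The kernel width controls how "local" the explanation is: a wider kernel averages the black box's behaviour over a larger neighbourhood.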

Example-focused Methods

TODO: Graphic for counterfactuals

Example-focused Methods

TODO: Graphic for prototypes

Interested in learning more?

Backup slides

LRP (Layer-wise Relevance Propagation)

LIME for images

LIME for text

Units in the wine dataset

  • fixed acidity: g(tartaric acid)/dm³
  • volatile acidity: g(acetic acid)/dm³
  • citric acid: g/dm³
  • residual sugar: g/dm³
  • chlorides: g(sodium chloride)/dm³
  • free sulfur dioxide: mg/dm³
  • total sulfur dioxide: mg/dm³
  • density: g/cm³
  • pH
  • sulphates: g(potassium sulphate)/dm³
  • alcohol: vol.%
  • quality: based on sensory data (score 0-10)